Speed up crsql_changes merges ~8x#7
Merged
Conversation
Per merged change, the insert path paid a full prepare/finalize cycle, several redundant statement executions and - worst of all - three hot statements used RETURNING, which makes SQLite materialize an ephemeral btree (with its own pager and page cache) on every executed row. - Drop RETURNING from winner clock, pk lookaside, site_id ordinal and backfill inserts; read values from bound params or last_insert_rowid - Bind unpacked pks directly to the key lookaside statements instead of round-tripping through a temporary 'SELECT ?,?,...' prepare per row - Toggle the sync bit through a direct pointer instead of executing SELECT crsql_internal_sync_bit(x) statements per change - Memoize (table, pk) -> (lookaside key, causal length) of the last merged row; changesets are ordered by (db_version, seq) so the N column changes of a row hit the memo back-to-back. Invalidated on commit/rollback, savepoint rollback (new vtab xRollback/xRollbackTo, module iVersion 2), local CRR writes, table info reloads, compaction and on any merge error (statement journal undo) - Memoize the last site_id -> ordinal resolution (changesets are virtually always single-site) - Compute next_db_version in C, bound as a value, and skip the PRAGMA data_version probe when pendingDbVersion is already set for the open transaction Import of 80k changes (20k rows): 0.94s -> 0.11s. Idempotent re-import: 0.18s -> 0.06s. Adds a savepoint-rollback regression test (with NDEBUG-immune asserts) and test/perf/bench-import.sh.
A site's col_version is monotonic per cell, so an incoming change whose col_version equals the local clock entry AND whose site_id matches the entry's author is the identical change: reject it without reading the local value from the data table. The site ordinal rides along in the col_version select (same clock row, no extra cost) and the incoming site's ordinal resolves through the existing site_id memo. True concurrent edits (equal versions from different sites) still fall through to the deterministic value comparison, verified by an explicit two-peer convergence check and the python correctness suite. Also folds the duplicated ordinal lookup in set_winner_clock into the shared lookup_site_ordinal helper. Idempotent re-import of 80k changes: 0.063s -> 0.035s (0.179s before this PR).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
Importing CRDT diffs (
INSERT INTO crsql_changes) was slow enough that apps resorted to hand-rolled batching hacks. Profiling showed that per merged change the extension paid a fullsqlite3_prepare/finalizecycle, ~7 redundant statement executions, and — the dominant cost — three hot statements usedRETURNING, which makes SQLite materialize an ephemeral btree with its own pager + page cache on every executed row (visible as massive sys-time / VM churn). The clock-tableRETURNING keyliterally echoed back a value bound as parameter 1.What
RETURNINGfrom all hot-path statements — winner clock (use the bound key), pk-lookaside inserts (__crsql_keyis a rowid alias →sqlite3_last_insert_rowid), site-id ordinal insert, and theas_crrbackfill pathcrsql_get_or_create_key_packed()binds unpacked PKColumnValues directly into the cached key statementsint*(ExtData.syncBitPtr) instead of steppingSELECT crsql_internal_sync_bit(x)twice per change(table, pk) → (lookaside key, causal length): changesets arrive ordered by(db_version, seq), so the N column changes of one row hit it back-to-back. Invalidated on commit/rollback hooks, savepoint rollback (new vtabxRollback/xRollbackTo, module iVersion 2), local-write trigger entry, table-info reloads,compact_post_alter, and any merge error (statement-journal undo)crsql_next_db_versioncomputed in C and bound as a value; skips thePRAGMA data_versionprobe whenpendingDbVersionis already established in the open transactionResults (Release, Apple Silicon, single transaction)
Remaining profile is genuine btree work (column upsert + clock insert).
Testing
testRowMemoSavepointRollbackcovering memo invalidation acrossROLLBACK TO(written with NDEBUG-immune checks, sinceassertis compiled out in Release)make asancurrently hangs in the ASAN runtime's own init on recent Xcode — pre-existing, unrelated)db_version > ?) exchanges → byte-identical peerscore/test/perf/bench-import.shadded for repeatable measurement